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Abstract 

In this article, it is described how to use statistical data analysis to 
obtain models directly from data. The focus is put on finding nonlineari- 
ties within a generalized additive model. These models are found by the 
means of backfitting algorithms or more general versions, like the alter- 
nating conditional expectation value method. The method is illustrated 
by numerically generated data. As an application the example of vortex 
ripple dynamics, a highly complex fluid-granular system is treated. 



1 Introduction 

One of the major goals of spatiotemporal data analysis consists in inferring a 
model from a given spatiotemporal data set. Conventional ways of modeling 
are purely theoretical arguing and a posteriori evaluation of possible models 
with suitable measures. In this article, statistical methods and applications 
to infer generalized additive models from data are presented. The attention 
is put on spatio-temporal systems, but there are certainly more possibilities of 
using the ideas in physics and engineering. Generalized additive models have 
been introduced to statistical analysis in the 1980's [[[} ||. The development of 
solving algorithms and mathematical proofs has been developed contemporarily 
(see (2| and references therein). 

During the last two decades, natural sciences profited a lot by the increasing 
computer power of modern machines. In nonlinear and statistical sciences, old 
techniques could be exploited to their full extent and new methods have been 
developed. The identification of nonlinearities is one of the important topics in 
data analysis of. e.g., chaotic or pattern forming systems |j, Qj. This in turn is 
feasible by statistical tools. 
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Careful application of algorithms and thorough interpretation of results leads 
to a direct way to obtain models from data. Hereafter, the basic ideas of the 
techniques to solve for a generalized additive model are explained and an ap- 
plication to spatio-temporal data is given. The intention is not to give a full 
overview of statistical methods, nor to report an experiment in detail. Rather, 
it is shown how a consistent procedure to infer models from data can be used as 
a feedback to theory which in turn can motivate experimental designs to yield 
new data and so on. 

The paper is structured as follows: In Sec. || the algorithmic solutions for gen- 
eralized additive models are pointed out. This is done by the example of ACE, 
the Alternating Conditional Expectation value algorithm, which uses the back- 
fitting technique. This presentation is given some room; even though nothing 
new for statisticians, technical points need some explanation to be understood. 
Sec. [| provides more information to be used in the context of spatio-temporal 
data analysis. In Sec. || problems arising from preprocessing of data are dis- 
cussed along with the pattern formation example of vortex ripple dynamics. 
The paper concludes with a short discussion in Sec. @. 



2 Generalized additive models and backfitting 

The standard tool of data analysis for physicists and engineers is the multivariate 
linear regression, where one fits the model 

M 

U Q = C + ^a i U i + e (1) 

i=l 

to experimental data. The symbols e and Ui (upper case letters), i = 0, . . . , M 
denote a random process with given distribution, C is a constant and the aj 
are model parameters. One measures a realization Uj (lower case letters), i — 
0, . . . , M of the random process and finds estimates cfj for the parameters, e.g., 
by least squares fitting Throughout the text, estimations are denoted by a 
hat, only where ambiguity exists, otherwise additional symbols are omitted. 

There are many ways of generalization. One often used way is the generalized 
linear regression model, 

M 

U = C + ^a i f i {U i ) + e. (2) 
i=i 

In this model, the data analyst chooses beforehand the functions fa due to 
some prior knowledge about the system or simply by guessing. Then, a linear 
regression procedure yields estimates for the parameters c^. So, the model is 
expressed in parametric form by the estimates on. For details about linear 
regression, see, e.g., J|, 
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If no obvious choice for a function is at hand, it is desirable to fit the non- 
parametric model 

M 

u = c + Y / fi{u i ) + e. ( 3 ) 

i=l 

Given some data, estimates fi must result. With this kind of modeling no prior 
knowledge about functional dependences is needed as input; one can find very 
general functional forms. A disadvantage is the absence of even more general 
terms, like fi{Ui, Uj), j ^ i. The reason for this lack is a practical one: as 
will become clear below, any implementation finding functions of d variables 
has to compete with the curse of dimensionality. For the generalized additive 
model (^), estimates for the functions /, are found by the backfitting algorithm 
|i} H 0| . This is an iterative procedure, working by the following rules: 

1. Initialization: C = E(U ), ft = f°,i= 1, -, M. 

2. Iteration: for i = 1, ...,M calculate 

fi = Elu -C-J2fk(U k )\U k 

\ kjH 

until convergence, 

where EQ denotes the expectation value and the f® are some appropriate initial 
settings. In applications, one has to obtain the functions from data as realiza- 
tions of the generating process. Then, this operator has to be replaced by its 
estimator. In each iteration step we estimate the functions by fiiUi) = Si(-\Ui), 
a smoothing operator which returns a function dependent on C7j. 

The discussion of properties and choice of the smoother is a subtle issue 
and lies far beyond the scope of this paper, a detailed discussion is given in 
[|| . The examples presented in this article have been obtained using the simple 
running mean smoother (moving average) . This is not the best choice for many 
problems, e.g., points at the boundaries have to be considered with care or 
must be cut - throughout this article, there have been enough data to afford 
for the luxury to cut the boundaries, in spatio-temporal systems there are often 
many thousand of data points. Smoothing splines are good smoothers in many 
situations due to their differentiability properties |^|, g]. In any case, a careful 
check should be undertaken for a concrete set of data. 

The iteration works by adjusting only for one function, subtracting all the 
others from the estimation of Uq, i.e. one smoothes the partial residual (Uo — 
Sfe fk) against C/j. In spatiotemporal data analysis, on is faced with the task of 
finding equations of motion from data. These can be coupled map lattices, a set 
of ordinary coupled differential equations, partial differential equations or even 
more complicated models. In many of these formulae, the lhs. is linear (e.g. 
a time derivative) and Eq. ([}]) constitutes an appropriate additive, nonlinear 
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model. If the measured data are nonlinear transforms through a measuring 
function, it can be desirable to model a nonlinear lhs, too: 



fo(U ) = C + ^2f i (U i ) + e, (4) 



j=i 



The constant can be absorbed w.l.o.g. into the functions Now, the model 
is more symmetrical and indeed, in physics, often situations without a clear 
predictor (lhs) - response (rhs) structure are found. A typical example is pat- 
tern formation where one is interested in the nonlinear interaction of stationary 
states. A priori it is not clear which variables constitute the state space nor is 
it always possible to measure them. Data analysis provides then an estimation 
of the model as a nonlinear, possibly noninvertible, transformation of the mea- 
sured data. As an example, take the relation Uq = G(U\ + U2) with G some 
nonlinear invertible function. The inverted equation C? _1 ([/o) = U1 + U2 cannot 
be found with (^), but with the model ([5]). An example for this case is given 
below. 

An algorithm to solve (^) is ACE, (the Alternating Conditional Expectation 
value algorithm) |)| , it works by minimizing the squared error 



E 



A I 



(5) 



The method shall be explained stepping from the one-dimensional estimation 
over two dimensional to the M-dimensional problem. 

1) U = fi(U 1 )+e. 

For this simple model, one obtains from Eq. (^) 

EiUo-fiiU^] 2 =min, (6) 

where minimization is achieved by variation of f\ in the space of measurable 
functions [|| . After some elementary steps one finds the solution 

fi(Ui) = E{U \Ux) . (7) 

For applications, again a smoother is used as estimator of the expectation value 
operator. The result of the smoothing on a set of numerically produced data 
is shown for the example Uq = U\ in Fig. |l|. The data has been generated by 
first drawing 10 000 equally distributed random numbers for u\ in the interval 
(—3, 3). These numbers have been squared to yield the values for uq, after this 
Gaussian noise N(0, 0.1 2 ) has been added. The bandwidth of the running mean 
smoother has been set here and below to 200 points. In Fig. |l|, the scatterplot of 
the points (u±, uq) is shown (grey) together with the estimate for the function /1, 
found by Eq. (Q) (black, thick line). As indication for the estimation error the 
pointwise standard deviation has been calculated from the estimated function 
and the residuals (upper and lower black lines). 
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Figure 1: One step transformation in the case f Q = Uo, f\{U\) = U\- The 
scatterplot of points (wo,ui) is shown by grey dots (not labelled). Applying the 
running mean smoother yields an estimation for the function /i (middle, thick 
black line), above and below is plotted twice the pointwise standard deviation 
(upper and lower black lines). 
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2) MU ) = h(U 1 ) + e. 

In this case, the solution is found iteratively: 

1. Initialization: f {U ) = [U - E{U )} /y/var(U ). 

2. Computation of /i(t/i) = E[f a (U )\Ui] 

3. Computation of f (U ) = £[/i(Z7i)|t7 ] 

4. Normalization: / (C/ ) = fo{U )/^var(f (U )). 

5. Iteration of 2)-4) until convergence. 

A smoother with nonzero bandwidth is contracting and thus the trivial solution 
is excluded by normalization (point 4) with the variance var(fo(Uo)). 

In order to show how ACE handles non-invertible relations, data (Acos9, 
A sin 9) have been generated, with 9 = (p + e v and tp an equally distributed ran- 
dom number in [0, 2ir) and A = 1 + The noise e v , €a is Gaussian distributed, 
with N(0, 0.05 2 ). This corresponds to the noisy measurement of an amplitude 
and a phase with a nonlinear, noninvertible measurement transformation. 

Without noise, one has Uq = cos(ip), Ui — s'm((p) and thus U§ = 1 — Iff. 
With noise, the relation reads to first order U§ — 2eA Uq = 1 — + 2€a U i . This 
means that multiplicative noise enters in predictor and response, touching the 
errors in variables problem which is relevant for many applications. The strong, 
multiplicative transformation of the noise is mirrored directly in the results. 

In Fig. U a) the data are displayed in a scatterplot, in Fig. |^ b) and c), the 
resulting functions are shown on top of the respective residuals together with 
twice the pointwise standard deviation as an error indicator. Due to the non- 
linear noise transformation the functions are distorted. The pointwise standard 
deviation is less significant as error indicator due to the very asymmetric local 
distribution of values, to be seen in the upper and lower curves of Fig. ^ b) and 
c). In this case asymmetric measures should be calculated. The above example 
underlines a conceptual problem when one is faced with errors in the measure- 
ment variables. The general treatment of this problem is not simple, for more 
details see [l^] and references therein. 

Even though the precise estimation of the functional dependence is not pos- 
sible in the above case, one finds an approximation which would not be so bad 
for many experiments. Please note that linear tools would not be able to find 
this relation easily, rather one should undergo a trial-and-error procedure until 
a guess for the right form of the functions is found. Some problems with a 
non-Gaussian distribution, however, remain in linear regression, too. 

3) MU ) = EtiiMU l )+e 

This equation defines a (noisy) hypersurface in M-dimensional space (in two 
dimensions, one finds a line). The functions fall on a hyper plane: the al- 
gorithm performs a transformation of the points {U%, . . . , Um} to the points 
{/i, . . . , /a/} such that the best plane in the mean squared sense is found. 
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Figure 2: Example for Uq = A sin 6, Ui = A cos 6, where A and 9 include 
measurement noise (a). The expected functional relation is noninvertible: Uq + 
Uf = 1, noise on A, 9 is transformed nonlinearly. The grey dots in (b), (c) show 
this effect in the residuals. This in turn influences the result in that it induces 
bending at the ends and in the center part of the results of ACE, where the 
transformation has its biggest effect /o and /i (see the middle line in (b), (c)). 
The symmetric pointwise standard deviation is not a good indicator for the 
asymmetry (upper and lower lines in (b), (c)), according asymmetric measures 
should be calculated. Functions arc shifted to zero mean. 
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The modification in the algorithm has to be done in the steps 2. and 3. 
above. They are replaced by 

2. Computation of /<(!/<) = E[f (U ) - J2k=£i h(U k )\Ui] by backfitting. 

3. Computation of f (U ) = fi{Ui) \ U ] 

This shows that the ACE algorithm is based on backfitting. To demonstrate a 
higher-dimensional example, the relation Uq = exp(Lq + E/f) has been chosen. 
The variables have been generated using equally distributed values for U\ , U2 
in the interval (—2,2). From the resulting numbers, uq has been calculated, 
thus the smallest possible value of uq is exp(— 2) ~ 0.14, the largest is exp(6) ~ 
400. After this procedure (in contrast to the previous example), Gaussian, 
7V(0,0.2 2 ), noise has been added to Uj, i = 0,1,2. This avoids the problems 
with transformation of measurement noise, discussed above. But at this place, 
another interesting question for the modeler is raised instead: what happens if 
one tries to include some additionally measured variable in the model? 

As an example, one can imagine a measurement of four variables, where it 
is not clear a priori which variables are needed in a minimal model. As a first 
guess, a generalized additive model /o(E/o) = fi(U%) + fiiU-i) + ^2(^3) will be 
tested. This situation shall be now related to the data generated above: an 
additional, equally distributed variable in the interval (0, 100), uncorrelated to 
Uq, Ui, U2, is generated and fed as fourth variable into ACE. 

The three-variable additive model is fo(Uo) — fi{U\) + fiiJJ?) and one 
expects ACE to find the estimates f (U ) = ln(E/ ) , A(f/i) = Ui , h{U 2 ) = L/f 
due to the additive structure. For the four dimensional model, one expects an 
unchanged result for /o, fi, ji plus a zero function f 3(1/3) = in the additional 
variable. 

In Fig. ||, the 3D scatterplot of the generated relevant data is shown by the 
points (ln([/o), E/i, U2), the Uq-qxis is logarithmic. A two dimensional represen- 
tation is not able to display the full relation, if one plots, e.g., Uq vs. U% (Fig. |J), 
no functional dependence is cognizable. The transformations (/oj /ij /2> /a) 
found by ACE are shown in Fig. |^ as functions of their arguments. The function 
fo is found to be logarithmic, / = 0.5 ln(u ) +c, f\ is linear with a slope of 0.5, 
and fo, — au\ + c is quadratic also with a factor a — 0.5. The factor 0.5 reflects 
possible ambiguities in the result, as well as the addition of the constant, sec 



Sec. 3.2 



After multiplying the results by 2 and adding a constant to fo and /2 one 
can speak of a good coincidence of expected and found result. The additional 
variable 113 has no dependence on the others and yields /3 = O(10~ 2 ) ~ 0. 
Thus, the method is able to identify uncorrelated terms. A comparison of runs 
with and without the additional third term on the rhs showed that the functions 
are virtually unchanged, confirming the stability of the algorithm. Despite the 
seemingly convincing result it must be noted that results can be misleading if 
there exist correlations between an additional term and "true" variables. If the 
task is to find a minimal model, one has to consider this effect. The connection 



of correlation with ACE is discussed in Sec. 3.3 
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Figure 3: Three dimensional plot of the data from Uo = cxp([/i + C/f ) + e. 



a b 




Figure 4: Two dimensional scatter plots of a part of the data as projection 
on the plane. The examples uo , u\ and u 2 , u 3 are shown, even "guessing" a 
logarithmic relation for u , no clear relation can be read on the left, on the 
right, the data are completely random according to the zero correlation of 1*3 
with the rest. 




Figure 5: Transformations for the given data with twice the pointwise standard 
errors (black lines) on top of the residuals (grey). The logarithm /o(uo) is found 
over three decades to hit numerical restrictions at the smallest points. The linear 
and square functions f± and fi are found very well. The variable U3 is found to 
be "unimportant" with fa(v,3) = 0. 



3 Some further topics 

3.1 Smoothers and the bias- variance dilemma 

As mentioned above, the choice of the smoother can be delicate. Typically, 
to each smoother belongs a smoothing parameter which characterizes its band- 
width (window width). For the running mean smoother, one obtains for problem 

(0) 

h{U u ) = l/N l u **> ( 8 ) 

where the hat denotes estimation and B(Uo y i) is a neighborhood of the point 
Uqj. Increasing bandwidth enlarges the neighborhood and thus the number Ni 
of data points in B. The respective expectation and variance are 

E{h{Ui,i)) = 1/JV, ]T f(U 0d ), (9) 
jeB(u ,i) 

var{h{U u )) = a 2 /N t , (10) 

where 07 = a is assumed for simplicity. Thus, the variance shrinks with increas- 
ing bandwidth; the bias, however, increases because more terms different from 
f(Uo,i) are involved. If one wants to find the optimum choice of the bandwidth 
in order to have neither large bias nor large variance, one uses measures like the 
mean-squared error or the average predictive squared error; these are evaluated 
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using cross-validation and yield a criterion for the bandwidth to be chosen. For 
more information it is referred to the literature |2| [To| |. 

The iteration procedure should run until convergence. In fact, convergence 
is not guaranteed for all classes of smoothers, asymptotic properties and conver- 
gence is discussed in a rigorous way in B & |l^] . Intuitively, it is clear that the 
iteration should stop when the errors on the functions fall below the noise e in 
the model (f|). Then it is a question how to find properly the respective errors. 
There are different ways to estimate the error consistently, depending on the 
smoother and the error models used. Locally, one can choose twice the point- 
wise standard error as a characterization, global confidence criteria are harder 
to derive |ll| , an elegant and modern approach is given by the Bayesian for- 
mulation of the backfitting algorithm jlij. In the case of errors in variables, 
care has to be taken to calculate the error correctly, a Bayesian approach is 
appealing in that case. In the examples, the most naive criterion, i.e., twice the 
pointwise standard error has been used. 



3.2 Peculiarities of ACE 

The ACE algorithm can be related to other algorithms, namely canonical vari- 
ates and alternating least squares methods. Special properties of all three meth- 
ods are discussed in |lf| , peculiarities of ACE are discussed in [^|, [| , here a few 
items shall be given without a claim for completeness. 

i) ACE is symmetric in the predictor and the response, which is quite unusual 
for a regression tool, but suits well several setups in physical applications. 

ii) Transformations of the variables are not reproduced by ACE. Take, e.g., the 
case Uq = fi{U\) + e. If one applies a transformation to each of the sides, ACE 
does not necessarily find fi again. This reflects parts of the nature of the in- 
verse problem, which is in general not uniquely solvable. Especially, adding a 
constant to either of the sides in the model is ambiguous in ACE. 

iii) Different distributions in the model constituents can "distort" the resulting 
functions. E.g., it may happen that one term in a model with two variables is 
Gaussian distributed and another one is equally distributed in some interval. 
Then, the Gaussian variable is bent at the ends (cf. Fig. ph. 



3.3 Maximal Correlation 

The ACE algorithm can be regarded as a regression tool, but there exists another 
possible interpretation. Instead of solving the least squares error minimization 
problem 

M 

E[fo(U ) - MM 2 = min (11) 

i=l 

one can reformulate the above to 

* = corr ^f (U ), J2 M U i^J = max ' ( 12 ) 
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where corr denotes the correlation function. This reflects the principle of the 
maximal correlation ge| jl7|, ^ The functions that fulfill Eq. @ are 

called optimal (in the sense of correlation) transformations, found by the ACE 
procedure. To have a global measure for the importance of a single term fi(Ui) , 
there are several possibilities, e.g., one can choose the overall variance, var(fi), 
but this does not reflect correlation with other terms and a noise term with 
large variance would be judged to be important. Here, the author suggests to 
use Eq. ( |l2|) in a symmetric way, 



(13) 



to characterize the importance of a single term. The use of this measure is 
demonstrated for the data from the example used for Fig. [|. The values = 
0.9779, *i = 0.9272, * 2 = 0.9690 and # 3 = 0.0426 are found, confirming again 
the unimportance of the additional uncorrelated variable, this time by the global 
measure ^3. 

If however, the additional variable is correlated, e.g., by passively following 
another variable, a high correlation will follow, yielding a model with more 
components. In the case of spatiotemporal modeling one has to be specially 
careful, because one is interested in minimal (in the number of terms) models, 
but dynamical dependencies can result easily in correlations, like in the case of 
slaved variables. 



3.4 Generalizations 

One can generalize the procedure to include more complex, non-additive cou- 
plings, e.g. in the three dimensional case, one considers the model 

U = F{U u U 2 ) + e, (14) 

with F some function, describing a two dimensional surface in a three dimen- 
sional space spanned by U 0l U\, U 2 . Like for Eq. (]?]), one finds 

F(U U U 2 ) = E(Uo\U x ,U 2 ) . (15) 

Going to higher dimensions requires in general more data due to the curse of 
dimensionality, additionally one has to decide again which estimator to use for 
correct results. 

As an example, the relation Uo — U\ ■ U 2 (Fig. ^a) is used. Ten thousand 
equally distributed data points in the range (—1, 1) for U\, U 2 have been gener- 
ated, from these Uo is obtained by multiplication. Gaussian, N(0, 0.1 2 , noise has 
been added afterwards to each of the three variables as noise realizations. The 
estimated function coincides well with the expectation (Fig. ^ja). Note that in 
this case ACE yields as a result log Uo = log U\ + log U 2 because of the additive 
structure of the model (Na) 
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Figure 6: Left: noisy data for the surface Uq = U\ ■ U2, a — 0.05 for all Ui. 
Right: Optimal two dimensional transformation. The result is very good, the 
CEV smoother is able to accurately find the hyperbolic profile. 



For this model, many applications with quasi-two-dimensional setting can be 
found, e.g., in geological and hydrodynamical systems (e.g. the two dimensional 
Euler equations). 

4 Application to spatiotemporal data 

Typical spatiotemporal models of interest for a physicist are coupled map (CML) 
systems , coupled ordinary differential equations (ODE's) or partial differential 
equations (PDE's). In all these models, derivatives or finite differences are 
involved. Here, the measured quantity shall consist of a space-time record, given 
by a movie, i.e. a series of consecutive pictures. From the time series in one point 
one builds finite differences in time, from the pictures one calculates differences 
in space. This procedure is numerically not trivial: addition (or subtraction, 
respectively) is numerically bad conditioned and can yield extinction of digits, 
measurement noise is then shifted to the first digits. Basically, one must use the 
apparatus known from numerical integration [js], ^0|, ^| to use correct sampling 
in space and time. In the case of a periodic setup, accuracy loss can be limited 
by using spectral methods ||, |2^] . 

As an example, let us take the Complex Ginzburg-Landau Equation. It 
describes a well pattern formating system above onset p3] : 

d t A - (1 + i Cl )AA + (1 - ic 3 )h{\A\ 2 )A . (16) 

The function /1 is a complex nonlinear function. The task of data analysis 
-supposed that A can be measured- is to determine the nonlinearity f±. To 
apply the ideas of backfitting we put Uo(x,t) — dtA/ A, Ui(x,t) = AA/A, 
U2(x,t) = \A\ 2 . Real and imaginary part must be treated separately. Using 
the iteration procedure, one finally obtains estimates for the parameters c\, C3 
and the full function j\. In a slightly more complicated setup, this has been 
performed in j24j|. In fact, for this example the numerically critical step is rather 
the correct evaluation of derivatives than the application of backfitting. 



13 



If one wants to find a model, but there are no physical arguments which 
derivatives should occur or it is even unclear if the measured quantities are 
state variables or have undergone some measurement transformations, one can 
still try to find a generalized additive model. In that case, it can be worth to 
fit a discrete model (CML), that represents the system dynamics, avoiding a 
good part of the trouble with finite differences (one can choose mapping units 
big enough to avoid numerical problems). 

A recently presented experiment considers underwater vortex ripples [^5], |2(i| . 
The ripple formation is driven by a periodical motion of water over a sand bed 
with given amplitude and frequency. A picture of the ripples -viewed from 
the side- is given in Fig. a), the resulting space time plot is shown in Fig. [?| 
b). In this paper, experimental details are not given, for those it is referred to 
the literature |26|, ^7j. But the treatment of the data and subsequent results 
are presented to demonstrate application of the nonparametric regression. The 

a) 



b) 
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Figure 7: a) Ripple profile as observed in the experiment. The driving amplitude 
a is marked. With the bottom at rest, water is oscillating forth and back over 
the crests. Triangles (top, black line) are fitted to raw picture (lower, dark 
profile) to determine the crests, b) The experimental evolution of the position 
of the ripple crests starting from ripples with a small length yields the shown 
space time plot. 
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model is formed by coarse graining of space and time. In time, one uses a 
mapping from period to period, in space, ripples are characterized by their 
length A. The interaction between ripples is modeled by a nonlinear interaction 
function which must be concave by stability analysis. The model is obtained 
for ripples close to the final length; without discussing further the physical 
mechanisms, it reads: 

A s: (T + 1) - A((T) = — /(Aj_i(T)) + f(Xi(T)) - /(WT)) + e • (17) 

The subscript i denotes the space index. 

The process of data analysis involved the following steps: 

i) calculating the ripple length, this has been done by fitting triangles to the 
raw profiles and calculating the distances from crest to crest (cf. Fig. [?]. The 
accuracy amounted to two digits. 

ii) Calculation of the rhs of Eq. (jl7|), this has been done by first calculating 
the difference, the result has been very noisy due to the numerical extinction of 
digits. To compensate for this effect, the differences have been smoothed and 
small numbers have been discarded. Please note that no "unsuitable" points 
have been excluded (a common, but often wrong technique for outliers) , rather 
it is a numerical requirement. In any case, the model has been developed from 
considerations of long ripples not too far from the final length and thus very 
small ripples are not expected to follow the model dynamics. 

iii) To apply a backfitting algorithm, one has to identify the variables E/j as they 
have been denoted in the previous sections. We put formally At A = \{T + 

1) - A-t(T) = U , Xi-i(T) = U u Xi(T) = U 2 , and A i+ i(T) = U 3 . The lhs 
shall be linear, so the model (||) has been used. For this example, the functions 
are denoted by fx = fi, f% = / c , fa = f r , for the left, center and right one, 
respectively. In the present context this allows easier interpretation in terms of 
the spatial structure. The data under consideration have been obtained by nine 
experimental runs, each between 1000 and 6000 data points. 

The experimental runs have been stationary as far as experimental tech- 
nique is concerned. Each data point (A t A,(T), Aj_x(T), Aj(T), Aj + i(T)) can be 
considered as an independent realization of the underlying process. Thus, the 
data from all nine measurements have been analyzed together. The result is 
shown in Fig. ^. For each function, the importance criterion is calculated with 
*o = 0.732, *i = 0.560, * 2 = 0.8604 and * 3 = 0.6376. From these numbers 
one obtains basically two informations: 1) There is a large part of variance 
which cannot be modeled. This can have two reasons, the first is measurement 
noise, which has been reduced as much as possible, the second is that parts of 
the dynamical equations (higher order interactions or similar) are not included 
in the model. A priori it is not easy (or even impossible) to distinguish which 
is the reason, again the error in variables problem has to be treated with care. 

2) The growth of a ripple depends mainly on the length of this ripple, but the 
neighbors cannot be neglected. This is seen by the lower correlations for the 
neighboring functions. 

As a basic test, the result functions must show the expected concave shape. 
This is perfectly true in the range the model should hold. There are, however 
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Figure 8: Plot of the functions fi, f c , f r with pointwise standard deviations 
(black lines) on top of the partial residuals (grey dots). The left and right 
functions show wide scatter, reflected in the individual correlations (numbers 
are given in the text). The curves display the expected convex shape. For small 
ripple length there is a systematic deviation in the left and right function. 



differences between the neighbor functions f r and the center one, f c . Mainly 
one notes a very large standard deviation at certain regions of f r and a 
systematic deviation, a bend, at small ripple lengths. At this point the data 
analysis result gives feedback to the experiment and the theoretical modeling. 
It must be clarified if these differences have a physical background or if they are 
artificial effects due to the measurement procedure. This is ongoing work. 

In Fig. ^ the left, center and right function are plotted together. Encour- 
agingly, with respect to the established model, the functions fall close together 
except for small ripple lengths. The model has been developed in the region 
close to the final length, thus the result confirms the model. For small lengths, 
the result should be considered as input for further modeling. In the case of 
small ripple lengths, the deviations of the side functions to the center indicate 
that in the small ripple length region, a difference between center and sides ex- 
ists. If one averages over left, center and right function, assuming that all three 
functions are equal and differences arise from measurement errors one obtains 
the thick black line in Fig. [)| 

The model has been obtained as a great simplification of the highly complex 
granular-fluid system. Given additionally a certain amount of measurement 
and numerical errors from data processing, one can say that the model is well 



16 




-1.5 - 



2 l i l i I i l i l i l i l 

2 3 4 5 6 7 8 

X 

Figure 9: Results from averaging over different runs (/;, f c , f r ). For long ripple 
lengths -the regime, the model has been developed for- one observes a very good 
coincidence. Theoretically, a difference between sides and center is not expected; 
assuming them to be equal, averaging yields the black line as an estimation for 
the true interaction function. 



confirmed by direct, nonparametric data analysis. Further activity aims at a 
more general modeling in the sense of Eq. (|lj) to check if there are differences 
in the interaction functions of the sides and the center which is theoretically 
very improbable. The direct justification of the model ( |l7|) is a success not only 
for the modeler but as well for statistical data analysis in physical experiments. 



5 Discussion and conclusion 

Some of the facts about generalized additive models, backfitting algorithms and 
ACE have been presented. With a step by step explanation the basic ideas of 
algorithms and some technical details have been discussed. In the examples, 
problems with nonlinear transformation of data have been pointed out and the 
effect of additional, uncorrelated variables has been investigated. With data 
from an example for a complex pattern forming system, it has been shown how to 
apply the algorithms to spatio-temporal data and explained the main difficulties 
for applications. With data analysis a simple model has been validated, which 
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has been developed by physical arguments and coarse graining of the highly 
complex system of underwater vortex ripples. 

Generalized additive models occur often in physical and engineering appli- 
cations. The idea to obtain models directly from data can be realized with the 
techniques developed in statistical science. The focus of the presented methods 
lies in the possibility to determine nonlinearities. Results from the nonparamet- 
ric data analysis help the theoretician to improve analytical models as well as 
the applied physicist or engineer who just needs a working model to represent 
the system dynamics. These aspects can be closed to a circuit - measurement 
- data analysis - theoretical modeling - which can be iterated until a consistent 
model for the system under consideration is obtained. It can be expected that 
the use of nonparametric methods is helpful as well in situations where little 
is known about the system. There, working models can be inferred without 
further theoretical justification but as tools 

While in this article additive models have been, there exist techniques to 
solve for either additive or multiplicative models, e.g. the marginal integration 
[ p8| or alternatively Breimans II-method |^9| (without a claim for completeness). 
Implementations of the backfitting procedure or the ACE algorithm exist for the 
S-PLUS language or the public domain clone R-PLUS, or in the worldwide web 

& 

Statistical approaches are promising tools for the analysis of complex sys- 
tems, since nonlinearities are automatically fetched. Bayesian formulations 
promise to yield better error models and another, future application can be 
stochastic modeling, finding a representation of the residuals in form of a noise 
term with measured probability distribution. 
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